\section{{\tt walk}, the code walker}

When you write advanced macros, or when you're analyzing some code to check for
some property, you often need to design a function that walks through an
arbitrary piece of code, and does complex stuff on it. Such a function is called
a code walker. Code walkers can be used for selective adjustments, e.g.
changing a function's {\tt return} statements into something else, or checking
that a loop performs no {\tt break}, up to pretty advanced transformations, such
as CPS transformation (a way to encode full continuations into a language that
doesn't support them natively; see Paul Graham's On Lisp for an accessible
description of how it works), lazy semantics...

Anyway, code walkers are tricky to write, can involve a lot of boilerplate code,
and are generally brittle. To ease things as much as possible, Metalua comes
with a walk library, which intends to accelerate code walker implementation.
Since code walking is intrinsically tricky, the lib won't magically make it
trivial, but at least it will save you a lot of time and code when compared to
writing walkers from scratch. Moreover, other people who took the time to
learn the walker generator's API will enter into your code much faster.

\subsection{Principles}

Code walking is about traversing a tree, first from root to leaves, then from
leaves back to the root. This tree is not uniform: some nodes are expressions,
some are statements, some are blocks; and each of these node kinds is
subdivided in several sub-cases (addition, numeric for loop...). The basic code
walker just goes from root to leaves and back to root without doing anything.
Then it's up to you to plug some action callbacks into that walker, so that it
does interesting things for you.

Without entering into the details of AST structure, here is a simplified version
of the walking algorithm, pretending to work on a generic tree:

\begin{verbatim}
function traverse(node)
   local down_result = down(node)
   if down_result ~= 'break' then 
      for c in children(node) do
         traverse(node)
      end
   end
   up(node)
end
\end{verbatim}

The algorithm behind 'walk' is almost as simple as the one above, except that
it's specialized for AST trees. You can essentially specialize it by providing
your own up() and down() functions. These visitor functions perform whatever
action you want on the AST; moreover, down() has the option to return 'break':
in that case, the sub-nodes are not traversed. It allows the user to shun parts
of the tree, or to provide his own special traversal method in the down()
visitor.

The real walker generator is only marginally more complex than that:
\begin{itemize}
\item It lets you define your up() and down(), and down() can return 'break' to
  cut the tree traversal; however, these up() and down() functions are
  specialized by node kind: there can be separate callbacks for expr nodes, stat
  nodes, block nodes.
\item There's also a binder() visitor for identifier binders. Binders are
  variables which declare a new local variable; you'll find them in nodes
  `Local, `Localrec, `Forin, `Fornum, `Function. The binders are visited just
  before the variable's scope begins, i.e. before traversing a loop or a
  function's body, after traversing a `Local's right-hand side, before
  traversing a `Localrec's right-hand side. \\ 
  Notice that separate up() and down() visitors wouldn't make sense for
  binders, since they're leave nodes.
\item Visitor functions don't only take the visited node as parameter: they also
  take the list of all expr, stat and block nodes above it, up to the AST's
  root. This allows a child node to act on its parent nodes.
\end{itemize}

\subsection{API}
There are three main tree walkers: {\tt walk.expr()}, {\tt walk.stat()} and {\tt
  walk.block()}, to walk through the corresponding kinds of ASTs. Each of these
walkers takes as parameters a table {\tt cfg} containing the various visitor
functions, and the AST to walk through. The configuration table {\tt cfg} can
contain fields:
\begin{itemize}
\item {\tt cfg.stat.down(node, parent, grandparent...)}, which applies when
  traversing a statement down, i.e. before its children nodes are parsed,
  can modify the tree and return {\tt nil} or {\tt'break'}. The way children
  are traversed is decided {\em after} the {\tt down()} visitor has been run:
  this point matters when the visitor modifies its children nodes.
\item {\tt cfg.stat.up(node, parent, grandparent...)}, which applies on the
  way back up. It is applied even if {\tt cfg.stat.down()} returned
  {\tt'break'}, but in that case, the children have not been (and will not be)
  traversed. 
\item {\tt cfg.expr.down()} and {\tt cfg.expr.up()}, which work just as their
  {\tt stat} equivalent, but apply to expression nodes.\\
  Notice that in Lua, function calls and method invocations can be used as
  statements as well as expressions: in such cases, they are visited only by
  the statement visitor, not by the expression visitor.
\item {\tt cfg.block.down()} and {\tt cfg.block.up()} do the same for statement
  blocks: loops, conditional and function bodies.
\item {\tt cfg.binder(identifier, id\_parent, id\_grandparent...)}: this
  is run on identifiers which create a new local variable, just before that
  variable's scope begins.
\end{itemize}

Moreover, there is a {\tt walk.guess(cfg, ast)} walker which tries to guess the
type of the AST it receives, and applies the appropriate walker. When an AST can
be either an expression or a statement (nodes {\tt`Call} and {\tt`Invoke}), it
is interpreted as an expression.

\subsection{Examples}

A bit of practice now. Let's build the simplest walker possible, that does
nothing:

\begin{verbatim}
cfg = { }
walker = |ast| walk.block(cfg, ast)
\end{verbatim}

Now, let's say we want to catch and remove all statement calls to function
assert(). This can be done by removing its tag and content: an empty list is
simply ignored in an AST. So we're only interested in `Call nodes, and within
these nodes, we want the function to be `Id 'assert'. All of this is only
relevant to stat nodes:

\begin{verbatim}
function cfg.stat.down (x)
   match x with
   | `Call{ `Id 'assert', ... } -> x.tag=nil; x <- { }
   | _ -> -- not interested in this node, do nothing
   end
end
\end{verbatim}

You'll almost always want to use the 'match' extension to implement visitors.
The imperative table overrider ({\tt x <- y} a.k.a. {\tt table.override(x, y)}
also often comes handy to modify an AST.

We'll now remove {\tt assert()} calls in non-statements; we cannot
replace an expression with nothing, so we'll simply replace them with
{\tt nil}:

\begin{verbatim}
function cfg.expr.down (x)
   match x with
   | `Call{ `Id 'assert', ... } -> x <- `Nil
   | _ -> -- not interested in this node, do nothing
   end
end
\end{verbatim}


Here's a remark for functional programmers: this API is very imperative; you
might cringe at seeing the `Call nodes transformed in-place. Well, I tend to
agree but it's generally counter-productive to work against the grain of the
wood: Lua is imperative at heart, and design attempts at doing this functionally
sucked more than approaches that embraced imperativeness. 

\paragraph{Cuts}
By making down() return 'break', you can prevent the traversal to go further
down. This might be either because you're not interested in the subtrees, or
because you want to traverse them in a special way. In that later case, just do
the traversal by yourself in the down() function, and cut the walking by
returning 'break', so that nodes aren't re-traversed by the default walking
algorithm. We'll see that in the next, more complex example of listing free
variables.

This example is exclusively there for demonstration purposes. For actual
work on identifiers that require awareness of their binding state, there
is a dedicated {\tt walk.id} library.

We'll progressively build a walker that gathers all global variables
used in a given AST. This involves keeping, at all times, a set of
the identifiers currently bound by a "local" declaration, by function
parameters, by for loop variables etc. Then, every time an identifier is
found in the AST, its presence is checked in the current set of bound
variables. If it isn't in it, then that's a free (global) identifier.

The first thing we'll need is a scope handling system: something that keeps
track of what identifiers are currently in scope. It must also allow to save the
current scope (e.g. before we enter a new block) and restore it afterwards (e.g.
after we leave the block). This is quite straightforward and unrelated to code
walking; here is the code:

\begin{Verbatim}[fontsize=\scriptsize]
require 'std'
require 'walk'

-{ extension 'match' }

--------------------------------------------------------------------------------
-- Scope handling: ':push()' saves the current scope, ':pop()'
-- restores the previously saved one. ':add(identifiers_list)' adds
-- identifiers to the current scope. Current scope is stored in
-- '.current', as a string->boolean hashtable.
--------------------------------------------------------------------------------

local scope = { }
scope.__index = scope

function scope:new()
   local ret = { current = { } }
   ret.stack = { ret.current }
   setmetatable (ret, self)
   return ret
end

function scope:push()
   table.insert (self.stack, table.shallow_copy (self.current))
end

function scope:pop()
   self.current = table.remove (self.stack)
end

function scope:add (vars)
   for id in values (vars) do
      match id with `Id{ x } -> self.current[x] = true end
   end
end
\end{Verbatim}

(There is an improved version of that class in library {\tt walk.scope}; cf.
its documentation for details).

Now let's start designing the walker. We'll keep a scope object up to date, as
well as a set of free identifiers found, gathered every time we find an `Id{ }
node. To slightly simplify matter, we'll consider that the AST represents a
block.

\begin{Verbatim}[fontsize=\scriptsize]
local function fv (term)
   local freevars = { }
   local scope    = scope:new()
   local cfg      = { expr  = { } }

   function cfg.expr.down(x)
      match x with
      | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
      | _ -> -- pass
      end
   end

   walk.guess(cfg, term)
   return freevars
end
\end{Verbatim}

Since we don't ever add any identifier to the scope, this will just list all the
identifiers used in the AST, bound or not. Now let's start doing more
interesting things:

\begin{itemize}
\item We'll save the scope's state before entering a new block, and restore it
  when we leave it. That will be done by providing functions {\tt cfg.block.down()}
  and {\tt cfg.block.up()}. Saving and restoring will be performed by methods
  {\tt :push()} and {\tt :pop()}.
\item Whenever we find a {\tt local} declaration, we'll add the list of
  identifiers to the current scope, so that they won't be gathered when parsing
  the {\tt `Id} expression nodes. Notice that the identifiers declared by the
  'local' statement only start being valid after the statement, so we'll add
  them in the {\tt cfg.stat.up()} function rather than {\tt cfg.stat.down()}.
\end{itemize}

\begin{Verbatim}[fontsize=\scriptsize]
local cfg = { expr  = { },
              stat  = { },
              block = { } }
   [...]
   function cfg.stat.up(x)
      match x with
      | `Local{ vars, ... } -> scope:add(vars)
      | _ -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Create a separate scope for each block, close it when leaving.
   -----------------------------------------------------------------------------
   function cfg.block.down() scope:push() end
   function cfg.block.up()   scope:pop()  end  
\end{Verbatim}

This starts to be useful. We can also easily add the case for `Localrec{ } nodes
(the ones generated by {\tt "local function foo() ... end"}), where a variable
is already bound in the {\tt`Localrec} statement's right-hand side; so we do the
same as for {\tt`Local}, but we do it in the {\tt down()} function rather than
in the up() one.

We'll also take care of {\tt`Function}, {\tt`Forin} and {\tt`Fornum} nodes,
which introduce new bound identifiers as function parameters or loop variables.
This is quite straightforward; the only thing to take care of is to save the
scope before entering the function/loop body (in {\tt down()}), and restore it
when leaving (in {\tt up()}). The code becomes:

  
\begin{Verbatim}[fontsize=\scriptsize]
local function fv (term)
   local freevars = { }
   local scope    = scope:new()
   local cfg      = { expr  = { },
                      stat  = { },
                      block = { } }

   -----------------------------------------------------------------------------
   -- Check identifiers; add functions parameters to newly created scope.
   -----------------------------------------------------------------------------
   function cfg.expr.down(x)
      match x with
      | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
      | `Function{ params, _ } -> scope:push(); scope:add (params)
      | _ -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Close the function scope opened by 'down()'.
   -----------------------------------------------------------------------------
   function cfg.expr.up(x)  
      match x with
      | `Function{ ... } -> scope:pop()
      | _ -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Create a new scope and register loop variable[s] in it
   -----------------------------------------------------------------------------
   function cfg.stat.down(x)
      match x with
      | `Forin{ vars, ... }    -> scope:push(); scope:add(vars)
      | `Fornum{ var, ... }    -> scope:push(); scope:add{var}
      | `Localrec{ vars, ... } -> scope:add(vars)
      | `Local{ ... }          -> -- pass
      | _ -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Close the scopes opened by 'up()'
   -----------------------------------------------------------------------------
   function cfg.stat.up(x)
      match x with
      | `Forin{ ... } | `Fornum{ ... } -> scope:pop()
      | `Local{ vars, ... }            -> scope:add(vars)
      | _ -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Create a separate scope for each block, close it when leaving.
   -----------------------------------------------------------------------------
   function cfg.block.down() scope:push() end
   function cfg.block.up()   scope:pop()  end

   walk.guess(cfg, term)
   return freevars
end
\end{Verbatim}

This is almost correct now. There's one last tricky point of Lua's semantics
that we need to address: in {\tt repeat foo until bar} loops, "bar" is included
in "foo"'s scope. For instance, if we write {\tt repeat local x=foo() until
  x>3}, the "x" in the condition is the local variable "x" declared inside the
body. This violates our way of handling block scopes: the scope must be kept
alive after the block is finished. We'll fix this by providing a custom walking
for the block inside `Repeat, and preventing the normal walking to happen by
returning 'break':

\begin{Verbatim}[fontsize=\scriptsize]
   -----------------------------------------------------------------------------
   -- Create a new scope and register loop variable[s] in it
   -----------------------------------------------------------------------------
   function cfg.stat.down(x)
      match x with
      | `Forin{ vars, ... }    -> scope:push(); scope:add(vars)
      | `Fornum{ var, ... }    -> scope:push(); scope:add{var}
      | `Localrec{ vars, ... } -> scope:add(vars)
      | `Local{ ... }          -> -- pass
      | `Repeat{ block, cond } -> -- 'cond' is in the scope of 'block'
         scope:push()
         for s in values (block) do walk.stat(cfg)(s) end -- no new scope
         walk.expr(cfg)(cond)
         scope:pop()
         return 'break' -- No automatic walking of subparts 'cond' and 'body'
      | _ -> -- pass
      end
   end
\end{Verbatim}

That's it, we've now got a full free variables lister, and have demonstrated
most APIs offered by the basic 'walk' library. If you want to walk through
identifiers in a scope-aware way, though, you'll want to look at the {\tt
  walk.id} library.

\subsection{Library {\tt walk.id}, the scope-aware walker}

This library walks AST to gather information about the identifiers in it. It
calls distinct visitor functions depending on whether an identifier is bound or
free; moreover, when an identifier is bound, the visitor also receives its
binder node as a parameter. For instance, in {\tt +\{function(x) print(x)
  end\}}, the bound identifier walker will be called on the \verb|+{x}| in the
\verb|print| call, and the visitor's second parameter will be the {\tt`Function}
node which created the local variable {\tt x}.

\paragraph{API}
The library is loaded with \verb|require 'walk.id'|. The walkers provided are:
\begin{itemize}
\item {\tt walk\_id.expr()};
\item {\tt walk\_id.stat()};
\item {\tt walk\_id.block()};
\item {\tt walk\_id.guess()}.
\end{itemize}

They take the same config tables as regular walkers, except they also
recognize the following entries:
\begin{itemize}
\item {\tt cfg.id.free(identifier, parent, grandparent...)}, which is run on
  free variables;
\item {\tt cfg.id.bound(identifier, binder, parent, grandparent...)}, which is
  run on bound variables. The statement or expression which created this bound
  veriable's scope is passed as a second parameter, before the parent nodes.
\end{itemize}

\paragraph{Examples}
Let's rewrite the free variables walker above, with the id walker:

\begin{Verbatim}[fontsize=\scriptsize]
function fv (term)
   local cfg = { id = { } }
   local freevars = { }
   function cfg.id.free(id)
      local id_name = id[1]
      freevars[id_name] = true
   end
   walk_id.guess (cfg, term)
   return freevars
end
\end{Verbatim}

Now, let's $\alpha$-rename all bound variables in a term. This is slightly
trickier than one could think: we need to first walk the whole tree, then
perform all the replacement. If we renamed binders as we went, they would stop
binding their variables, and something messy would happen. For instance, if we
took {\tt +\{function(x) print(x) end\}} and renamed the binder {\tt x} into
{\tt foo}, we'd then start walking the function body on the tree {\tt
  +\{function(foo) print(x) end\}}, where {\tt x} isn't bound anymore.

\begin{Verbatim}[fontsize=\scriptsize]
--------------------------------------------------------------------------------
-- bound_vars keeps a binder node -> old_name -> new_name dictionary.
-- It allows to associate all instances of an identifier together,
-- with the binder that created them
--------------------------------------------------------------------------------
local bound_vars    = { }

--------------------------------------------------------------------------------
-- local_renames will keep all identifier nodes to rename as keys,
-- and their new name as values. The renaming must happen after 
-- the whole tree has been visited, in order to avoid breaking captures.
--------------------------------------------------------------------------------
local local_renames = { }

--------------------------------------------------------------------------------
-- Associate a new name in bound_vars when a local variable is created.
--------------------------------------------------------------------------------
function cfg.binder (id, binder)
   local old_name         = id[1]
   local binder_table     = bound_vars[binder]
   if not binder_table then
      -- Create a new entry for this binder:
      binder_table        = { }
      bound_vars[binder]  = binder_table
   end
   local new_name         = mlp.gensym(old_name)[1] -- generate a new name
   binder_table[old_name] = new_name -- remember name for next instances
   local_renames[id]      = new_name -- add to the rename todo-list
end

--------------------------------------------------------------------------------
-- Add a bound variable the the rename todo-list
--------------------------------------------------------------------------------
function cfg.id.bound (id, binder)
   local old_name    = id[1]
   local new_name    = bound_vars[binder][old_name]
   local_renames[id] = new_name
end

-- walk the tree and fill laocal_renames:
walk_id.guess(cfg, ast)

-- perform the renaming actions:
for id, new_name in pairs(local_renames) do id[1] = new_name end
\end{Verbatim}

\subsection{Library {\tt walk.scope}, the scope helper}
This library allows to easily store data, in an AST walking function, in a scope
aware way. Cf. comments in the sources for details.
